-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small performance optimization when fetching items from library. #2798
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into this! It does seem like a much better idea to fetch everything up front instead of looping over each item and then issuing a separate query. 👍 👍 👍
I have a couple of questions within, including a follow-up on your concerns about the list of IDs in the SELECT query.
Have you run any simple performance comparisons (just time
on a query or two) to see what the impact is, even if it didn't solve the main bottleneck?
Finally had some time to rerun my performance tests under at least somewhat untainted circumstances. Without the patch
With the patch
Both are fastest runs out of 5. Based on this I'd say on average it gives a 4% speedup. Edit: Forgot to mention, a lof of that time seems to be in the actual output generation. Loading the entire collection in roquette takes about 3s, so about 66% less. Given that, the 0.4s speedup doesn't sound that bad actually. |
Indeed! Thanks for doing the measurements—that's really helpful to know. And it helps point the way to the next bottleneck too (formatting the output). |
Finally had time to do a revision on this. I moved the flex attr query to I did some further measurements, to get the actual raw performance of iterating the entire library (11832 items): start = time.time()
for item in lib.items(query):
continue
end = time.time() On master this takes I also see some unit test failures in travis that I don't really understand (I don't get those locally), so if someone has some pointers on what to do about those I'd really appreciate it. |
Nice!! I agree this is looking good—I'm convinced this is the way to go, and the code changes are nice and clean. Awesome work!!! I'm also pretty confused by the test failures. I am able to reproduce them locally (for example, by running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I see the problem with those tests. The issue arises when querying for model objects that have no flexattrs. In this case, _get_indexed_flex_attrs
does not have an entry for the object (even an empty one). So trying to look up its flexattrs raises a KeyError.
I've proposed an easy fix in an above inline comment. One more complicated fix would be to change _get_indexed_flex_attrs
to produce empty entries for other model objects, but that's probably not a good idea.
To reason this comes up in the tests for metasync
, for what it's worth, is the way the harness sets up its fixtures:
Lines 62 to 71 in 17d9882
items[0].title = 'Tessellate' | |
items[0].artist = 'alt-J' | |
items[0].albumartist = 'alt-J' | |
items[0].album = 'An Awesome Wave' | |
items[0].itunes_rating = 60 | |
items[1].title = 'Breezeblocks' | |
items[1].artist = 'alt-J' | |
items[1].albumartist = 'alt-J' | |
items[1].album = 'An Awesome Wave' |
There's one item with a flexattr and one without.
Thanks for the explanation. All makes sense now :) |
Hey! Just checking, this is good to go right? If so I'll go ahead and merge it |
Yes; sorry for not responding! This looks good to go. It's a small change but a very important one! I say merge away. |
Fetch flexible attributes once for all items instead of once per item. For queries with larger result sets this drastically cuts down the number of issued sqlite queries.
This resolves the query length and potential security problems, while keeping the performance benefits.
Fetch flexible attributes once for all relevant items instead of once per item.
For queries with larger result sets this drastically cuts down the number of issued sqlite queries.
In the end, this didn't quite give the performance boost I'd hoped it would, but it feels still slightly faster for larger queries.
Unfortunately the hot spot seems to be somewhere else though :-/
One thing to note, since the list of IDs used in the WHERE..IN clause can be quite large (and reach sqlite's query parameter limit), I resorted to not passing them as parameters.
This should be fine, since these are not values originating from input but coming from the database from a previous query. Not sure how big of an issue this is nonetheless.